 input unit



Optimizing Neural Networks via Koopman Operator Theory (Supplemental Material)

Neural Information Processing Systems

As discussed in Sec. 3 of the main text, the computational complexity of Koopman training is … We assume that both standard training and Koopman training use simple matrix computation methods. We note that none of these factors are relevant for Koopman training. The finite section method, Eq. 4, implies the run time complexity would be … Koopman operator(s) and evolve each partition separately from the others. In Sec. 3, we discussed when we think this "patching" approach should give small errors. (The authors contributed equally. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.)


Quantum Boltzmann Machines using Parallel Annealing for Medical Image Classification

arXiv.org Artificial Intelligence

Exploiting the fact that samples drawn from a quantum annealer inherently follow a Boltzmann-like distribution, annealing-based Quantum Boltzmann Machines (QBMs) have gained increasing popularity in the quantum research community. While they hold great promise for quantum speed-up, their use currently remains costly, as large amounts of QPU time are required to train them. This limits their applicability in the NISQ era. Following the idea of Noè et al. (2024), who tried to alleviate this cost by incorporating parallel quantum annealing into their unsupervised training of QBMs, this paper presents an improved version of parallel quantum annealing that we employ to train QBMs in a supervised setting. Saving qubits to encode the inputs, the latter setting allows us to test our approach on medical images from the MedMNIST data set (Yang et al., 2023), thereby moving closer to real-world applicability of the technology. Our experiments show that QBMs using our approach already achieve reasonable results, comparable to those of similarly sized Convolutional Neural Networks (CNNs), with markedly fewer epochs than these classical models. Our parallel annealing technique leads to a speed-up of almost 70% compared to regular annealing-based BM executions.


SQL4NN: Validation and expressive querying of models as data

arXiv.org Artificial Intelligence

Any serious machine learning project will quickly produce a multitude of models learned from data. These models are then validated, tested in different ways, modified, retrained, and archived or deployed. This multitude of models also constitutes valuable data in itself. We may refer to such data as intensional, in contrast to what we already know as extensional data: training data, background data, data for validation, etc. Our point of view in this paper is that models are data too and should be managed using database technology, just like the extensional data. In particular, we should be able to query models. Indeed, many tasks that we usually consider as validation, analysis, explanation, verification, pruning, etc., of models [Rud19, Alb21, LAL


What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)?

arXiv.org Artificial Intelligence

This paper establishes a rigorous connection between circuit representations and tensor factorizations, two seemingly distinct yet fundamentally related areas. By connecting these fields, we highlight a series of opportunities that can benefit both communities. Our work generalizes popular tensor factorizations within the circuit language, and unifies various circuit learning algorithms under a single, generalized hierarchical factorization framework. Specifically, we introduce a modular "Lego block" approach to build tensorized circuit architectures. This, in turn, allows us to systematically construct and explore various circuit and tensor factorization models while maintaining tractability. This connection not only clarifies similarities and differences in existing models, but also enables the development of a comprehensive pipeline for building and optimizing new circuit/tensor factorization architectures. We show the effectiveness of our framework through extensive empirical evaluations, and highlight new research opportunities for tensor factorizations in probabilistic modeling.


Sum of Squares Circuits

arXiv.org Artificial Intelligence

Designing expressive generative models that support exact and efficient inference is a core question in probabilistic ML. Probabilistic circuits (PCs) offer a framework where this tractability-vs-expressiveness trade-off can be analyzed theoretically. Recently, squared PCs encoding subtractive mixtures via negative parameters have emerged as tractable models that can be exponentially more expressive than monotonic PCs, i.e., PCs with positive parameters only. In this paper, we provide a more precise theoretical characterization of the expressiveness relationships among these models. First, we prove that squared PCs can be less expressive than monotonic ones. Second, we formalize a novel class of PCs -- sum of squares PCs -- that can be exponentially more expressive than both squared and monotonic PCs. Around sum of squares PCs, we build an expressiveness hierarchy that allows us to precisely unify and separate different tractable model classes such as Born Machines and PSD models, and other recently introduced tractable probabilistic models by using complex parameters. Finally, we empirically show the effectiveness of sum of squares circuits in performing distribution estimation.


Sum-Product-Set Networks: Deep Tractable Models for Tree-Structured Graphs

arXiv.org Artificial Intelligence

Daily internet communication relies heavily on tree-structured graphs, embodied by popular data formats such as XML and JSON. However, many recent generative (probabilistic) models utilize neural networks to learn a probability distribution over undirected cyclic graphs. This assumption of a generic graph structure brings various computational challenges, and, more importantly, the presence of non-linearities in neural networks does not permit tractable probabilistic inference. We address these problems by proposing sum-product-set networks, an extension of probabilistic circuits from unstructured tensor data to tree-structured graph data. To this end, we use random finite sets to reflect a variable number of nodes and edges in the graph and to allow for exact and efficient inference. We demonstrate that our tractable model performs comparably to various intractable models based on neural networks.


Scaling Continuous Latent Variable Models as Probabilistic Integral Circuits

arXiv.org Artificial Intelligence

Probabilistic integral circuits (PICs) have been recently introduced as probabilistic models enjoying the key ingredient behind expressive generative models: continuous latent variables (LVs). PICs are symbolic computational graphs defining continuous LV models as hierarchies of functions that are summed and multiplied together, or integrated over some LVs. They are tractable if LVs can be analytically integrated out, otherwise they can be approximated by tractable probabilistic circuits (PC) encoding a hierarchical numerical quadrature process, called QPCs. So far, only tree-shaped PICs have been explored, and training them via numerical quadrature requires memory-intensive processing at scale. In this paper, we address these issues, and present: (i) a pipeline for building DAG-shaped PICs out of arbitrary variable decompositions, (ii) a procedure for training PICs using tensorized circuit architectures, and (iii) neural functional sharing techniques to allow scalable training.
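
To make the quadrature idea in the abstract above concrete, here is a minimal sketch, not the authors' QPC construction. It assumes a toy model with a single latent variable z ~ N(0, 1) and x | z ~ N(z, 1), and approximates the marginal p(x) = ∫ p(x | z) p(z) dz with a trapezoidal rule, which turns the integral into a finite weighted sum (a mixture over the quadrature nodes). The function names, the integration range [-8, 8], and the number of nodes are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def trapezoid_nodes(lo, hi, k):
    """Return k equally spaced nodes on [lo, hi] and their trapezoid weights."""
    h = (hi - lo) / (k - 1)
    nodes = [lo + i * h for i in range(k)]
    weights = [h] * k
    weights[0] = weights[-1] = h / 2.0  # end points count half
    return nodes, weights


def marginal_by_quadrature(x, nodes, weights):
    """Approximate p(x) = integral of p(x|z) p(z) dz by a finite weighted sum.

    Toy assumption: p(z) = N(0, 1) and p(x|z) = N(z, 1).
    """
    return sum(
        w * normal_pdf(x, z, 1.0) * normal_pdf(z, 0.0, 1.0)
        for z, w in zip(nodes, weights)
    )


nodes, weights = trapezoid_nodes(-8.0, 8.0, 401)
approx = marginal_by_quadrature(0.5, nodes, weights)

# For this toy model the marginal is known in closed form: x ~ N(0, 2).
exact = normal_pdf(0.5, 0.0, math.sqrt(2.0))
print(approx, exact)
```

In this toy case the closed-form marginal is available, so the quadrature result can be checked directly; the interest of the approach lies in models where the integral has no analytic solution.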


8a0e1141fd37fa5b98d5bb769ba1a7cc-Reviews.html

Neural Information Processing Systems

This paper aims to measure the similarity between pairs of short texts using a neural network. Each input unit is associated with a set of words from each text and first computes a bilinear match score from its pair of word sets. The rest of the network is more classical. The connectivity patterns between the network units, i.e., the association of terms to input units and the connections between layers, come from a multi-resolution topic model.
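
A bilinear match score of the kind mentioned in the review above can be sketched as follows. This is only an illustration under assumptions: each word set is represented as a numeric vector (for example, word counts), and the score is x^T A y + b with a weight matrix A and an optional bias b. The function name and the example values are hypothetical, not taken from the reviewed paper.

```python
def bilinear_match_score(x, y, A, b=0.0):
    """Return x^T A y + b for vectors x (length p), y (length q), and a p-by-q matrix A."""
    if len(A) != len(x) or any(len(row) != len(y) for row in A):
        raise ValueError("A must have shape (len(x), len(y))")
    # Compute each row's dot product with y, then weight it by the matching entry of x.
    return sum(
        x_i * sum(a_ij * y_j for a_ij, y_j in zip(row, y))
        for x_i, row in zip(x, A)
    ) + b


# Hypothetical vectors for the word sets drawn from the two texts.
x = [1.0, 0.0, 2.0]
y = [0.0, 1.0]
A = [
    [0.5, -1.0],
    [0.0, 2.0],
    [1.5, 0.25],
]
score = bilinear_match_score(x, y, A)
print(score)
```

Unlike a plain dot product, the matrix A lets every feature of one text interact with every feature of the other, so the two vectors do not need to share a dimensionality.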


Overview of Autoencoders. Autoencoders are a type of neural…

#artificialintelligence

Autoencoders are a type of neural network that can be used to learn a compressed representation of a dataset. They consist of two main parts: an encoder, which maps the input data to a lower-dimensional representation, and a decoder, which maps the lower-dimensional representation back to the original dimensionality. The architecture has three layers: an input layer (m input units), an encoding layer (n hidden units), and a decoding layer (m output units), where m is the number of input units and n is the number of hidden units. The number of hidden units can be chosen based on the desired level of compression. The output of the decoder is the reconstruction of the input.
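
The layer structure described above can be illustrated with a minimal forward pass. This sketch assumes a purely linear autoencoder with hand-picked weights and no training, m = 4 input units and n = 2 hidden units; the function names and weight values are hypothetical and chosen only to show the shapes involved.

```python
def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def encode(W_enc, x):
    """Map m input units to n hidden units (linear encoder, assumed for simplicity)."""
    return matvec(W_enc, x)


def decode(W_dec, z):
    """Map n hidden units back to m output units (linear decoder)."""
    return matvec(W_dec, z)


def reconstruction_error(x, x_hat):
    """Sum of squared differences between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


m, n = 4, 2  # m input/output units, n hidden units

# Hand-picked weights: the encoder keeps the first two coordinates,
# the decoder writes them back and fills the rest with zeros.
W_enc = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0]]  # shape n x m
W_dec = [[1.0, 0.0],
         [0.0, 1.0],
         [0.0, 0.0],
         [0.0, 0.0]]  # shape m x n

x = [1.0, 2.0, 3.0, 4.0]
z = encode(W_enc, x)      # compressed representation, length n
x_hat = decode(W_dec, z)  # reconstruction, length m
err = reconstruction_error(x, x_hat)
print(z, x_hat, err)
```

With these fixed weights the last two coordinates are lost, so the reconstruction error is nonzero. In practice the weights would be learned by minimizing this error over a dataset, usually with nonlinear activations in the layers.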